Abstract

Previous compact representations of permutations have focused on adding a small index
on top of the plain data $\langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$, in order to efficiently support the application of
the inverse or the iterated permutation. In this paper we initiate the study of techniques that
exploit the compressibility of the data itself, while retaining efficient computation of $\pi(i)$ and
its inverse. In particular, we focus on exploiting runs, which are subsets (contiguous or not) of
the domain where the permutation is monotonic. Several variants of those types of runs arise in
real applications such as inverted indexes and suffix arrays. Furthermore, our improved results
on compressed data structures for permutations also yield better adaptive sorting algorithms.

1 Introduction 

Permutations of the integers $[1..n] = \{1, \ldots, n\}$ are not only a fundamental mathematical
structure, but also a basic building block for the succinct encoding of integer functions [MR04],
strings [Kar99, GMR06, GV06, ANS06, MN07, CHSV08], binary relations [BHMR07], and
geometric grids [BLNS09], among others. A permutation $\pi$ can be trivially encoded in $n\lceil \lg n \rceil$ bits,
which is within $O(n)$ bits of the information theory lower bound of $\lg(n!)$ bits, where $\lg x = \log_2 x$
denotes the logarithm in base two.

In most of those applications, efficient computation is required both for the value $\pi(i)$ at any
point $i \in [1..n]$ of the permutation, and for the position $\pi^{-1}(j)$ of any value $j \in [1..n]$ (i.e., the value
of the inverse permutation). The only alternative we are aware of to storing explicitly both $\pi$ and
$\pi^{-1}$ is by Munro et al. [MRRR03], who add a small structure over the plain representation of $\pi$ so
that, by spending $\epsilon n \lg n$ extra bits, any $\pi^{-1}(j)$ can be computed in time $O(1/\epsilon)$. This is extended
to any positive or negative power of $\pi$, $\pi^k(i)$. They give another solution using $O(n)$ extra bits and
computing any $\pi^k(j)$ in time $O(\lg n / \lg\lg n)$.

The lower bound of $\lg(n!)$ bits yields a lower bound of $\Omega(n \lg n)$ comparisons to sort such a
permutation in the comparison model, in the worst case over all permutations of $n$ elements. Yet,
a large body of research has been dedicated to finding better sorting algorithms which can take
advantage of specificities of each permutation to sort. Some examples are permutations composed of
a few sorted blocks [Man85] (e.g., (1, 3, 5, 7, 9, 2, 4, 6, 8, 10) or (6, 7, 8, 9, 10, 1, 2, 3, 4, 5)), or
permutations containing few sorted subsequences [LP94] (e.g., (1, 6, 2, 7, 3, 8, 4, 9, 5, 10)). Algorithms
performing possibly $o(n \lg n)$ comparisons on such permutations, yet still $O(n \lg n)$ comparisons in
the worst case, are achievable and preferable if those permutations arise with sufficient frequency.
Other examples are classes of permutations whose structure makes them interesting for applications:
see the seminal paper of Mannila [Man85], and the survey of Moffat and Petersson [MP92] for more
details.

Each sorting algorithm in the comparison model yields an encoding scheme for permutations: the
result of all comparisons performed uniquely identifies the permutation sorted, and hence encodes it.
Since an adaptive sorting algorithm performs $o(n \lg n)$ comparisons on a class of "easy" permutations,
each adaptive algorithm yields a compression scheme for permutations, at the cost of losing a
constant factor on the complementary class of "hard" permutations. Yet such compression schemes
do not necessarily support efficiently the computation of the value $\pi^{-1}(j)$ of the inverse permutation
for an arbitrary value $j \in [1..n]$, or even the simple application of the permutation, $\pi(i)$.

This is the topic of our study: the interplay between adaptive sorting algorithms and compressed
representations of permutations that support efficient application of $\pi(i)$ and $\pi^{-1}(j)$. In particular
we focus on classes of permutations that can be decomposed into a small number of runs, that is,
monotone subsequences of $\pi$, either contiguous or not.

Our results include compressed representations of permutations whose space and time to
compute any $\pi(i)$ and $\pi^{-1}(j)$ are proportional to the entropy of the distribution of the sizes of the
runs. As far as we know, this is the first compressed representation of permutations with similar
capabilities.

We also develop the corresponding sorting algorithms, which in general refine the known com- 
plexities to sort those classes of permutations: While there exist sorting algorithms taking advantage 
of the number of runs of various kinds, ours take advantage of their size distribution and are strictly 
better (or equal, at worst). 

Finally, we obtain a representation for strings that improves upon the state of the art [FMMN07,
GRR08] in the average case, while retaining their space and worst-case performance for operations
access, rank, and select.

At the end of the article we describe some applications where the class of permutations
compressible with the techniques we develop here naturally arises, and conclude with a more general
perspective on the meaning of those results and the research directions they suggest.

2 Basic Concepts and Previous Work 
2.1 Entropy 

We define the entropy of a distribution [CT91], a measure that will be useful to evaluate
compressibility results.

Definition 1 The entropy of a sequence of positive integers $X = \langle n_1, n_2, \ldots, n_r \rangle$ adding up to $n$ is
$\mathcal{H}(X) = \sum_{i=1}^{r} \frac{n_i}{n} \lg \frac{n}{n_i}$. By concavity of the logarithm, it holds that $(r-1)\lg n \le n\mathcal{H}(X) \le n \lg r$
and that $\mathcal{H}(\langle n_1, n_2, \ldots, n_r \rangle) > \mathcal{H}(\langle n_1 + n_2, \ldots, n_r \rangle)$.

Here $\langle n_1, n_2, \ldots, n_r \rangle$ is a distribution of values adding up to $n$ and $\mathcal{H}(X)$ measures how even
the distribution is. $\mathcal{H}(X)$ is maximal ($\lg r$) when all $n_i = n/r$ and minimal ($\frac{r-1}{n} \lg n + \frac{n-r+1}{n} \lg \frac{n}{n-r+1}$)
when they are most skewed ($X = \langle 1, 1, \ldots, 1, n-r+1 \rangle$).
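As a small illustration of Definition 1, the following Python sketch evaluates $\mathcal{H}(X)$ directly from the formula. The function name `entropy` is ours, introduced only for this example.

```python
from math import log2

def entropy(xs):
    """H(X) = sum_i (n_i / n) * lg(n / n_i) for positive integers xs adding up to n."""
    n = sum(xs)
    return sum((x / n) * log2(n / x) for x in xs)

# An even distribution reaches the maximum lg r; a skewed one stays well below it.
even = entropy([5, 5])          # two runs of equal length: lg 2 = 1 bit
skewed = entropy([1, 1, 1, 7])  # most skewed vector for n = 10, r = 4
```

Merging two entries, as in the last inequality of Definition 1, can only lower the value: `entropy([2, 1, 7])` is smaller than `entropy([1, 1, 1, 7])`.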

This measure is related to the entropy of random variables and of sequences as follows. If a
random variable $P$ takes the value $i$ with probability $n_i/n$, for $1 \le i \le r$, then its entropy is
$\mathcal{H}(\langle n_1, n_2, \ldots, n_r \rangle)$. Similarly, if a string $S[1..n]$ contains $n_i$ occurrences of character $c_i$, then its
empirical zero-order entropy is $\mathcal{H}_0(S) = \mathcal{H}(\langle n_1, n_2, \ldots, n_r \rangle)$.

$\mathcal{H}(X)$ is then a lower bound to the average number of bits needed to encode an instance of $P$,
or to encode a character of $S$ (if we model $S$ statistically with a zero-order model, that is, ignoring
the context of characters).



2.2 Huffman Coding 

The Huffman algorithm [Huf52] receives frequencies $X = \langle n_1, n_2, \ldots, n_r \rangle$ adding up to $n$, and
outputs in $O(r \lg r)$ time a prefix-free code for the symbols $[1..r]$. If $\ell_i$ is the bit length of the code
assigned to the $i$th symbol, then $L = \sum_i \ell_i n_i$ is minimal. Moreover, $L < n(1 + \mathcal{H}(X))$. For example,
given $S[1..n]$ over alphabet $[1..r]$, with symbol frequencies $X$, one can compress $S$ by concatenating
the codewords of the successive symbols $S[i]$, achieving total length $L < n(1 + \mathcal{H}_0(S))$. (One also
has to encode the usually negligible codebook of $O(r \lg r)$ bits.)

Huffman's algorithm starts with a forest of $r$ leaves corresponding to the frequencies
$\{n_1, n_2, \ldots, n_r\}$, and outputs a binary trie with those leaves, in some order. This so-called Huffman
tree describes the optimal encoding as follows: The sequence of left/right choices (interpreted as
0/1) in the path from the root to each leaf is the prefix-free encoding of that leaf, of length $\ell_i$ equal
to the leaf depth.
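The code lengths $\ell_i$ can be obtained without materializing the trie. The following Python sketch uses a binary heap and simply increments the depth of every symbol below a newly created node; the name `huffman_lengths` and the tie-breaking rule are our own choices, not part of the original algorithm description, and the depth bookkeeping is quadratic in the worst case (a real implementation would keep the tree).

```python
import heapq

def huffman_lengths(freqs):
    """Return the codeword length of each symbol under binary Huffman coding."""
    if len(freqs) == 1:
        return [0]
    # Heap items: (weight, tie breaker, symbols below this node).
    heap = [(f, i, [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    lengths = [0] * len(freqs)
    tie = len(freqs)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                 # every symbol below the new node gets one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths

freqs = [1, 1, 2, 4]
lengths = huffman_lengths(freqs)
total = sum(f * l for f, l in zip(freqs, lengths))  # L = sum of l_i * n_i
```

For these frequencies $n = 8$ and $n\mathcal{H}(X) = 14$, so the total length stays below the bound $n(1 + \mathcal{H}(X)) = 22$ stated above.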

A generalization of this encoding is multiary Huffman coding [Huf52], in which the tree is given
arity $t$, and then the Huffman codewords are sequences over an alphabet $[1..t]$. In this case the
algorithm also produces the optimal code, of length $L < n(1 + \mathcal{H}(X)/\lg t)$.



2.3 Succinct Data Structures for Sequences 

Let $S[1..n]$ be a sequence of symbols from the alphabet $[1..r]$. This includes bitmaps when $r = 2$
(where, for convenience, the alphabet will be $\{0, 1\}$ rather than $\{1, 2\}$). We will make use of succinct
representations of $S$ that support the rank and select operators over strings and over binary vectors:
$\mathrm{rank}_c(S, i)$ gives the number of occurrences of $c$ in $S[1..i]$ and $\mathrm{select}_c(S, j)$ gives the position in $S$
of the $j$th occurrence of $c$.

When $r = 2$, $S$ requires $n$ bits and rank and select can be supported in constant time using
$O(n \lg\lg n / \lg n) = o(n)$ bits on top of $S$ [Mun96, Gol06].

Raman et al. [RRR02] devised a bitmap representation that takes $n\mathcal{H}_0(S) + o(n)$ bits, while
maintaining the constant time for supporting the operators. For the binary case $n\mathcal{H}_0(S)$ is just
$m \lg \frac{n}{m} + (n - m) \lg \frac{n}{n-m} = m \lg \frac{n}{m} + O(m)$, where $m$ is the number of bits set to 1 in $S$. Golynski
et al. [GGG+07] reduced the $o(n)$-bits redundancy in space to $O(n \lg\lg n / \lg^2 n)$.

When $m$ is much smaller than $n$, the $o(n)$-bits term may dominate. Gupta et al. [GHSV06]
showed how to achieve space $m \lg \frac{n}{m} + O(m \lg\lg \frac{n}{m} + \lg n)$ bits, which largely reduces the dependence
on $n$, but now rank and select are supported in $O(\lg m)$ time via binary search [Gup07, Theorem
17, p. 153].

For larger alphabets, of size $r = O(\mathrm{polylog}(n))$, Ferragina et al. [FMMN07] showed how to
represent the sequence within $n\mathcal{H}_0(S) + o(n \lg r)$ bits and support rank and select in constant
time. Golynski et al. [GRR08, Lemma 9] improved the space to $n\mathcal{H}_0(S) + o(n \lg r / \lg n)$ bits while
retaining constant times.

Grossi et al. [GGV03] introduced the so-called wavelet tree, which decomposes an arbitrary
sequence into several bitmaps. By representing the bitmaps in compressed form [GGG+07], the
overall space is $n\mathcal{H}_0(S) + o(n)$ and rank and select are supported in time $O(\lg r)$. Multiary
wavelet trees decompose the sequence into subsequences over a sublogarithmic-sized alphabet and
reduce the time to $O(1 + \lg r / \lg\lg n)$ [FMMN07, GRR08].

In this article $n$ will generally denote the length of the permutation. All of our $o()$ expressions,
even those including several variables, will be asymptotic in $n$.

2.4 Measures of Presortedness in Permutations 

The complexity of adaptive algorithms, for problems such as searching, sorting, merging sorted
arrays or convex hulls, is studied in the worst case over instances of fixed size and difficulty, for a
definition of difficulty that is specific to each analysis. Even though sorting a permutation in the
comparison model requires $\Theta(n \lg n)$ comparisons in the worst case over permutations of $n$ elements,
better results can be achieved for some parameterized classes of permutations. We describe some
of those below; see the survey by Moffat and Petersson [MP92] for others.

Knuth [Knu98] considered runs (contiguous ascending subsequences) of a permutation $\pi$,
counted by $\mathsf{nRuns} = 1 + |\{i : 1 \le i < n, \pi(i+1) < \pi(i)\}|$. Levcopoulos and Petersson [LP94]
introduced Shuffled Up-Sequences and its generalization Shuffled Monotone Sequences, respectively
counted by $\mathsf{nSUS} = \min\{k : \pi \text{ is covered by } k \text{ increasing subsequences}\}$, and $\mathsf{nSMS} = \min\{k :
\pi \text{ is covered by } k \text{ monotone subsequences}\}$. By definition, $\mathsf{nSMS} \le \mathsf{nSUS} \le \mathsf{nRuns}$.
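To make the measure nSUS concrete, the following Python sketch computes it for a permutation. It relies on a classical fact that is not stated in the text above: the minimum number of increasing subsequences covering a sequence equals the length of its longest decreasing subsequence. The function name `n_sus` is ours.

```python
from bisect import bisect_left

def n_sus(perm):
    """Minimum number of increasing subsequences covering perm.

    Computed as the length of the longest strictly decreasing subsequence,
    i.e. the longest strictly increasing subsequence of the negated values.
    """
    tails = []  # tails[k] = smallest possible last value of an increasing subsequence of length k + 1
    for x in perm:
        y = -x
        k = bisect_left(tails, y)
        if k == len(tails):
            tails.append(y)
        else:
            tails[k] = y
    return len(tails)

shuffled = n_sus([1, 6, 2, 7, 3, 8, 4, 9, 5, 10])
```

On the example (1, 6, 2, 7, 3, 8, 4, 9, 5, 10) from the Introduction this gives nSUS = 2, whereas the same permutation has nRuns = 5, consistent with nSUS ≤ nRuns.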

Munro and Spira [MS76] took an orthogonal approach, considering the task of sorting multisets
through various algorithms such as MergeSort, showing that they can be adapted to perform in
time $O(n(1 + \mathcal{H}(\langle m_1, \ldots, m_r \rangle)))$ where $m_i$ is the number of occurrences of $i$ in the multiset (note
this is totally different from our results, which depend on the distribution of the lengths of monotone
runs).

Each adaptive sorting algorithm in the comparison model yields a compression scheme for per- 
mutations, but the encoding thus defined does not necessarily support the simple application of the 
permutation to a single element without decompressing the whole permutation, nor the application 
of the inverse permutation. 

3 Contiguous Monotone Runs 

Our most fundamental representation takes advantage of permutations that are formed by a few 
monotone (ascending or descending) runs. 

Definition 2 A down step of a permutation $\pi$ over $[1..n]$ is a position $1 \le i < n$ such that $\pi(i+1) <
\pi(i)$. An ascending run in a permutation $\pi$ is a maximal range of consecutive positions $[i..j]$ that
does not contain any down step. Let $d_1, d_2, \ldots, d_k$ be the list of consecutive down steps in $\pi$. Then the
number of ascending runs of $\pi$ is noted $\mathsf{nRuns} = k + 1$, and the sequence of the lengths of the ascending
runs is noted $\mathsf{vRuns} = \langle n_1, n_2, \ldots, n_{\mathsf{nRuns}} \rangle$, where $n_1 = d_1$, $n_2 = d_2 - d_1$, \ldots, $n_{\mathsf{nRuns}-1} = d_k - d_{k-1}$,
and $n_{\mathsf{nRuns}} = n - d_k$. (If $k = 0$ then $\mathsf{nRuns} = 1$ and $\mathsf{vRuns} = \langle n_1 \rangle = \langle n \rangle$.) The notions of up step
and descending run are defined similarly.

For example, the permutation (1, 3, 5, 7, 9, 2, 4, 6, 8, 10) contains nRuns = 2 ascending runs, of
lengths forming the vector vRuns = ⟨5, 5⟩.
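Definition 2 translates into a single left-to-right scan. The Python sketch below (the function name `ascending_runs` is ours, and a non-empty input is assumed) returns nRuns together with vRuns.

```python
def ascending_runs(perm):
    """Return (nRuns, vRuns) for a non-empty sequence, cutting at every down step."""
    lengths, start = [], 0
    for i in range(1, len(perm)):
        if perm[i] < perm[i - 1]:      # down step at 1-based position i
            lengths.append(i - start)  # close the run that ends here
            start = i
    lengths.append(len(perm) - start)  # last run, of length n - d_k
    return len(lengths), lengths

n_runs, v_runs = ascending_runs([1, 3, 5, 7, 9, 2, 4, 6, 8, 10])
```

The scan takes linear time, which is the cost charged to finding the down steps in the construction described next.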

We now describe a data structure that represents a permutation partitioned into nRuns ascending
runs, and is able to compute any $\pi(i)$ and $\pi^{-1}(i)$.
3.1 Structure



Construction We find the down-steps of $\pi$ in linear time, obtaining nRuns runs of lengths $\mathsf{vRuns} =
\langle n_1, \ldots, n_{\mathsf{nRuns}} \rangle$, and then apply the Huffman algorithm to the vector vRuns. When we set up the
leaves $v$ of the Huffman tree, we store their original index in vRuns, $idx(v)$, and the starting
position in $\pi$ of their corresponding run, $pos(v)$. After the tree is built, we use $idx(v)$ to compute
a permutation $\phi$ over $[1..\mathsf{nRuns}]$ so that $\phi(i) = j$ if the leaf corresponding to $n_i$ is placed at the $j$th
left-to-right leaf in the Huffman tree. We also compute $\phi^{-1}$. We also precompute a bitmap $C[1..n]$
that marks the beginning of runs in $\pi$ and give constant-time support for rank and select. Since
$C$ contains only nRuns bits set out of $n$, it is represented in compressed form [GGG+07] within
$\mathsf{nRuns} \lg \frac{n}{\mathsf{nRuns}} + o(n)$ bits.

Now we set a new permutation $\pi'$ over $[1..n]$ where the runs are written in the order given by
$\phi^{-1}$: We first copy from $\pi$ the run whose endpoints are those of the leftmost tree leaf, then the
run pointed by the second leftmost leaf, and so on. Simultaneously, we compute $pos'(v)$ for the
leaves $v$, denoting the starting position of the area they cover in $\pi'$. After creating $\pi'$ the original
permutation $\pi$ can be deleted. We say that an internal node $v$ covers the contiguous area of $\pi'$ formed
by concatenating the runs of all the leaves that descend from $v$. We compute, for all nodes $v$,
$pos'(v)$, the starting position of the area covered by $v$ in $\pi'$, $length(v)$, the size of that area, and
$leaves(v)$, the number of leaves that descend from $v$.

Now we enhance the Huffman tree into a wavelet-tree-like structure [GGV03] without altering
its shape, as follows. Starting from the root, first process recursively each child. For the leaves we
do nothing. Once the left and right children, $v_l$ and $v_r$, of an internal node $v$ have been processed,
the invariant is that the areas they cover have already been sorted. We create a bitmap for $v$, of size
$length(v)$. Now we merge the areas of $v_l$ and $v_r$ in time $O(length(v))$. As we do the merging, each
time we take an element from $v_l$ we append a bit 0 to the node bitmap, and a bit 1 when we take
an element from $v_r$. When we finish, $\pi'$ has been sorted and we can delete it. The Huffman-shaped
wavelet tree (only with fields $leaves$ and $pos$), $\phi$, and $C$ represent $\pi$.
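The work done at a single internal node can be sketched as follows in Python. The function name `merge_with_bitmap` is ours, and plain lists stand in for the compressed bitmaps used by the actual structure.

```python
def merge_with_bitmap(left, right):
    """Merge two sorted lists; the bitmap records 0 for 'from left' and 1 for 'from right'."""
    merged, bits = [], []
    i = j = 0
    while i < len(left) or j < len(right):
        take_left = j == len(right) or (i < len(left) and left[i] <= right[j])
        if take_left:
            merged.append(left[i]); bits.append(0); i += 1
        else:
            merged.append(right[j]); bits.append(1); j += 1
    return merged, bits

merged, bits = merge_with_bitmap([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
```

Only `bits` is kept at the node; `merged` is passed up to the parent and discarded once the root has been processed.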



Space and construction cost Note that each of the $n_i$ elements of leaf $i$ (at depth $\ell_i$) is merged
$\ell_i$ times, contributing $\ell_i$ bits to the bitmaps of its ancestors, and thus the total number of bits in
all bitmaps is $\sum_i n_i \ell_i$. Thus the total number of bits in the Huffman-shaped wavelet tree is at most
$n(1 + \mathcal{H}(\mathsf{vRuns}))$. Those bitmaps, however, are represented in compressed form [GGG+07], which
allows us to remove the $n$ extra bits added by the Huffman encoding.

Let us call $m_j = n_{\phi^{-1}(j)}$ the length of the run corresponding to the $j$th left-to-right leaf, and
$m_{i,j} = m_i + \ldots + m_j$. The compressed representation [GGG+07] takes, on a bitmap of length $n$ and $m$
1s, $m \lg \frac{n}{m} + (n - m) \lg \frac{n}{n-m}$ bits, plus a redundancy of $O(n \lg\lg n / \lg^2 n)$ bits. We prove by induction
(see also Grossi et al. [GGV03]) that the compressed space allocated for all the bitmaps descending
from a node covering leaves $[i..k]$ is $\sum_{i \le r \le k} m_r \lg \frac{m_{i,k}}{m_r}$ (we consider the redundancy later). Consider
two sibling leaves merging two runs of $m_i$ and $m_{i+1}$ elements. Their parent bitmap contains $m_i$
0s and $m_{i+1}$ 1s, and thus its compressed representation requires $m_i \lg \frac{m_i + m_{i+1}}{m_i} + m_{i+1} \lg \frac{m_i + m_{i+1}}{m_{i+1}}$
bits. Now consider a general Huffman tree node merging a left subtree covering leaves $[i..j]$ and
a right subtree covering leaves $[j+1..k]$. Then the bitmap of the node will be compressed to
$m_{i,j} \lg \frac{m_{i,k}}{m_{i,j}} + m_{j+1,k} \lg \frac{m_{i,k}}{m_{j+1,k}}$ bits. By the inductive hypothesis, all the bitmaps on the left child and
its subtrees add up to $\sum_{i \le r \le j} m_r \lg \frac{m_{i,j}}{m_r}$, and those on the right add up to $\sum_{j+1 \le r \le k} m_r \lg \frac{m_{j+1,k}}{m_r}$.
Adding up the three formulas we get the inductive thesis.
Therefore, a compressed representation of the bitmaps requires $n\mathcal{H}(\mathsf{vRuns})$ bits, plus the
redundancy. The latter, added over all the bitmaps, is $O(n(1 + \mathcal{H}(\mathsf{vRuns})) \lg\lg n / \lg^2 n) = o(n)$ because
$\mathcal{H}(\mathsf{vRuns}) \le \lg n$.¹ To this we must add the $O(\mathsf{nRuns} \lg n)$ bits of the tree pointers and extra data
like $pos$ and $leaves$, the $O(\mathsf{nRuns} \lg \mathsf{nRuns})$ bits for $\phi$, and the $\mathsf{nRuns} \lg \frac{n}{\mathsf{nRuns}} + o(n)$ bits for $C$.

The construction time is $O(\mathsf{nRuns} \lg \mathsf{nRuns})$ for the Huffman algorithm, plus $O(\mathsf{nRuns})$ for
computing $\phi$ and filling the node fields like $pos$ and $leaves$, plus $O(n)$ for constructing $\pi'$ and $C$, plus the
total number of bits appended to all bitmaps, which includes the merging cost. The extra structures
for rank are built in linear time on those bitmaps.² All this adds up to $O(n(1 + \mathcal{H}(\mathsf{vRuns})))$, because
$\mathsf{nRuns} \lg \mathsf{nRuns} \le n\mathcal{H}(\mathsf{vRuns}) + \lg n$ by concavity, recall Definition 1.

3.2 Queries 

Computing $\pi$ and $\pi^{-1}$ One can regard the wavelet tree as a device that tracks the evolution of
a merge-sorting of $\pi'$, so that in the bottom we have (conceptually) the sequence $\pi'$ (with one run
per leaf) and in the top we have (conceptually) the sorted permutation $(1, 2, \ldots, n)$.

To compute $\pi^{-1}(j)$ we start at the top and find out where that position came from in $\pi'$. We
start at offset $j' = j$ of the root bitmap $B$. If $B[j'] = 0$, then position $j'$ came from the left subtree
in the merging. Thus we go down to the left child with $j' \leftarrow \mathrm{rank}_0(B, j')$, which is the position of
$j'$ in the array of the left child before the merging. Otherwise we go down to the right child with
$j' \leftarrow \mathrm{rank}_1(B, j')$. We continue recursively until we reach a leaf $v$. At this point we know that $j$
came from the corresponding run, at offset $j'$, that is, $\pi^{-1}(j) = pos(v) + j' - 1$.

To compute $\pi(i)$ we do the reverse process, but we must first determine the leaf $v$ and offset
$i'$ within $v$ corresponding to position $i$: We compute $l = \phi(\mathrm{rank}_1(C, i))$, so that $i$ falls at the $l$th
left-to-right leaf. Then we traverse the Huffman tree down so as to find the $l$th leaf. This is easily
done as we have $leaves(v)$ stored at internal nodes. Upon arriving at leaf $v$, we know that the offset
is $i' = i - pos(v) + 1$. We now start an upward traversal from $v$ using the nodes that are already
in the recursion stack. If $v$ is a left child of its parent $u$, then we set $i' \leftarrow \mathrm{select}_0(B, i')$ to locate
it in the merged array of the parent, else we set $i' \leftarrow \mathrm{select}_1(B, i')$, where $B$ is the bitmap of $u$.
Then we set $v \leftarrow u$ and continue until reaching the root, where we answer $\pi(i) = i'$.
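The two traversals can be sketched end to end in Python. This is a simplified illustration under several assumptions of ours, not the structure analyzed above: bitmaps are plain lists with linear-time `rank` and `select`, each leaf keeps the start of its run in $\pi$ directly (so neither $\pi'$ nor $\phi$ is materialized), a sorted array of run starts plays the role of $C$, and leaves are reached through direct pointers and parent pointers instead of the $leaves(v)$ counters. All names are hypothetical.

```python
import heapq
from bisect import bisect_right

class Node:
    def __init__(self, left=None, right=None, pos=None, length=0):
        self.left, self.right, self.parent = left, right, None
        self.pos = pos        # leaves only: 1-based start of the run in pi
        self.length = length  # number of elements covered
        self.bits = []        # internal nodes only: 0 = from left, 1 = from right

def rank(bits, b, i):
    """Occurrences of bit b in bits[1..i] (1-based, naive linear scan)."""
    return bits[:i].count(b)

def select(bits, b, j):
    """1-based position of the j-th occurrence of bit b (naive linear scan)."""
    for p, x in enumerate(bits, 1):
        if x == b:
            j -= 1
            if j == 0:
                return p
    raise ValueError("not enough occurrences")

def build(perm):
    """Return (root, leaves, starts) for a permutation of 1..n given as a list."""
    starts = [0] + [i for i in range(1, len(perm)) if perm[i] < perm[i - 1]]
    ends = starts[1:] + [len(perm)]
    leaves = [Node(pos=s + 1, length=e - s) for s, e in zip(starts, ends)]
    heap = [(leaf.length, k, leaf) for k, leaf in enumerate(leaves)]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:                      # Huffman: join the two lightest nodes
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        parent = Node(left=a, right=b, length=w1 + w2)
        a.parent = b.parent = parent
        heapq.heappush(heap, (w1 + w2, tie, parent))
        tie += 1
    root = heap[0][2]

    def fill(v):
        """Sort the area covered by v, recording the merge in v.bits."""
        if v.left is None:
            return perm[v.pos - 1 : v.pos - 1 + v.length]
        l, r = fill(v.left), fill(v.right)
        out, i, j = [], 0, 0
        while i < len(l) or j < len(r):
            if j == len(r) or (i < len(l) and l[i] < r[j]):
                out.append(l[i]); v.bits.append(0); i += 1
            else:
                out.append(r[j]); v.bits.append(1); j += 1
        return out

    fill(root)
    return root, leaves, starts

def pi_inverse(root, j):
    """Top-down traversal: follow value j back to its position in pi."""
    v = root
    while v.left is not None:
        b = v.bits[j - 1]
        j = rank(v.bits, b, j)
        v = v.right if b else v.left
    return v.pos + j - 1

def pi(leaves, starts, i):
    """Bottom-up traversal: follow position i up to its rank in the sorted order."""
    v = leaves[bisect_right(starts, i - 1) - 1]   # run containing i (role of rank on C)
    off = i - v.pos + 1
    while v.parent is not None:
        u = v.parent
        off = select(u.bits, 0 if v is u.left else 1, off)
        v = u
    return off

perm = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
root, leaves, starts = build(perm)
```

With this permutation, `pi(leaves, starts, 7)` follows the second run upwards and `pi_inverse(root, 4)` follows the fourth position of the root bitmap downwards; both visit one node per level of the leaf reached, which is the quantity bounded in the query time analysis.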

Query time In both queries the time is $O(\ell)$, where $\ell$ is the depth of the leaf arrived at. If $i$ is
chosen uniformly at random in $[1..n]$, then the average cost is $\frac{1}{n} \sum_i n_i \ell_i = O(1 + \mathcal{H}(\mathsf{vRuns}))$. However,
the worst case can be $O(\mathsf{nRuns})$ in a fully skewed tree. We can ensure $\ell = O(\lg \mathsf{nRuns})$ in the worst
case while maintaining the average case by slightly rebalancing the Huffman tree [ML01]. Given any
constant $x > 0$, the height of the Huffman tree can be bounded to at most $(1 + x) \lg \mathsf{nRuns}$ so that the
total number of bits added to the encoding is at most $n \cdot \mathsf{nRuns}^{-x \lg \varphi}$, where $\varphi \approx 1.618$ is the golden
ratio. This is $o(n)$ if $\mathsf{nRuns} = \omega(1)$, and otherwise the cost was $O(\mathsf{nRuns}) = O(1)$ anyway. Similarly,
the average time stays $O(1 + \mathcal{H}(\mathsf{vRuns}))$, as it increases at most by $O(\mathsf{nRuns}^{-x \lg \varphi}) = O(1)$. This
rebalancing takes just $O(\mathsf{nRuns})$ time if the frequencies are already sorted.

Note also that the space required by the query is $O(\lg \mathsf{nRuns})$. This can be made constant by
storing parent pointers in the wavelet tree, which does not change the asymptotic space.

¹ To make sure this is $o(n)$ even if there are many short bitmaps, we can concatenate all the bitmaps into a single
one, and replace pointers to bitmaps by offsets to this single bitmap. Operations rank and select translate easily
into a concatenated bitmap.

² While the linear construction time is not obvious from their article [GGG+07], a subsequent result [P08] achieved
even less redundancy and linear construction time.
Theorem 1 There is an encoding scheme using at most $n\mathcal{H}(\mathsf{vRuns}) + O(\mathsf{nRuns} \lg n) + o(n)$ bits to
represent a permutation $\pi$ over $[1..n]$ covered by nRuns contiguous ascending runs of lengths forming
the vector vRuns. It can be built within time $O(n(1 + \mathcal{H}(\mathsf{vRuns})))$, and supports the computation
of $\pi(i)$ and $\pi^{-1}(i)$ in time $O(1 + \lg \mathsf{nRuns})$ for any value of $i \in [1..n]$. If $i$ is chosen uniformly at
random in $[1..n]$ then the average computation time is $O(1 + \mathcal{H}(\mathsf{vRuns}))$.

We note that the space analysis leading to $n\mathcal{H}(\mathsf{vRuns}) + o(n)$ bits works for any tree shape.
We could have used a balanced tree, yet we would not achieve $O(1 + \mathcal{H}(\mathsf{vRuns}))$ average time. On
the other hand, by using Hu-Tucker codes instead of Huffman, as in our previous work [BN09], we
would not need the permutation $\phi$ and, by using compact tree representations [SN10], we would
be able to reduce the space to $n\mathcal{H}(\mathsf{vRuns}) + O(\mathsf{nRuns} \lg \frac{n}{\mathsf{nRuns}}) + o(n)$. This is interesting for large
values of nRuns, as it is always $n\mathcal{H}(\mathsf{vRuns}) + o(n(1 + \mathcal{H}(\mathsf{vRuns})))$ even if $\mathsf{nRuns} = \Theta(n)$.³

3.3 Mixing Ascending and Descending Runs 

We can easily extend Theorem 1 to mix ascending and descending runs.

Corollary 2 Theorem 1 holds verbatim if $\pi$ is partitioned into a sequence of nRuns contiguous
monotone (i.e., ascending or descending) runs of lengths forming the vector vRuns.

Proof. We mark in a bitmap of length nRuns whether each run is ascending or descending, and
then reverse descending runs in $\pi$, so as to obtain a new permutation $\pi_{asc}$, which is represented
using Theorem 1 (some runs of $\pi$ could now be merged in $\pi_{asc}$, but this only reduces $\mathcal{H}(\mathsf{vRuns})$,
recall Definition 1).

The values $\pi(i)$ and $\pi^{-1}(j)$ are easily computed from $\pi_{asc}$: If $\pi_{asc}^{-1}(j) = i$, we use $C$ to determine
that $i$ is within run $\pi_{asc}(\ell..r)$, that is, $\ell = \mathrm{select}_1(C, \mathrm{rank}_1(C, i))$ and $r = \mathrm{select}_1(C, \mathrm{rank}_1(C, i) + 1) - 1$.
If that run is reversed in $\pi$, then $\pi^{-1}(j) = \ell + r - i$, else $\pi^{-1}(j) = i$. For $\pi(i)$, we use $C$ to
determine that $i$ belongs to run $\pi(\ell..r)$. If the run is descending, then we return $\pi_{asc}(\ell + r - i)$,
else we return $\pi_{asc}(i)$. The operations on $C$ require only constant time. The extra construction
time is just $O(n)$, and no extra space is needed apart from $\mathsf{nRuns} = o(\mathsf{nRuns} \lg n)$ bits. □
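The reduction in the proof can be sketched in Python as follows. The greedy left-to-right partition into maximal monotone runs is an assumption of this example (the text leaves the choice of partition open), the linear scan over the run list stands in for the constant-time rank and select on $C$, and the function names are ours.

```python
def reverse_descending_runs(perm):
    """Return (asc, runs): asc is perm with each descending run reversed;
    runs lists (l, r, was_reversed) with 1-based inclusive endpoints."""
    asc, runs, i, n = list(perm), [], 0, len(perm)
    while i < n:
        j = i + 1
        if j < n and perm[j] < perm[i]:          # greedy: extend a descending run
            while j < n and perm[j] < perm[j - 1]:
                j += 1
            asc[i:j] = reversed(perm[i:j])
            runs.append((i + 1, j, True))
        else:                                    # greedy: extend an ascending run
            while j < n and perm[j] > perm[j - 1]:
                j += 1
            runs.append((i + 1, j, False))
        i = j
    return asc, runs

def pi_from_asc(asc, runs, i):
    """pi(i) recovered from asc through the mirror position l + r - i."""
    for l, r, rev in runs:
        if l <= i <= r:
            return asc[(l + r - i if rev else i) - 1]
    raise IndexError(i)

perm = [5, 4, 3, 1, 2, 6, 10, 9, 8, 7]
asc, runs = reverse_descending_runs(perm)
```

Here `asc` is what would be handed to the structure of Theorem 1, and the flag in each triple corresponds to the bitmap of length nRuns mentioned in the proof.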

Note that, unlike the case of ascending runs, where there is an obviously optimal way of
partitioning (that is, maximize the run lengths), we have some freedom when partitioning into ascending
or descending runs, at the endpoints of the runs: If an ascending (resp. descending) run is followed
by a descending (resp. ascending) run, the limiting element can be moved to either run; if two
ascending (resp. descending) runs are consecutive, one can create a new descending (resp. ascending)
run with the two endpoint elements. While finding the optimal partitioning might not be easy, we
note that these decisions cannot affect more than $O(\mathsf{nRuns})$ elements, and thus the entropy of the
partition cannot be modified by more than $O(\mathsf{nRuns} \lg n)$, which is absorbed by the redundancy of
our representation.



³ We do not follow this path because we are more interested in multiary codes (see Section 3.5) and, to the best
of our knowledge, there is no efficient (i.e., $O(\mathsf{nRuns} \lg \mathsf{nRuns})$ time) algorithm for building multiary Hu-Tucker codes
[Knu98].
3.4 Improved Adaptive Sorting



One of the best known sorting algorithms is MergeSort, based on a simple linear procedure to merge
two already sorted arrays, and with a worst case complexity of $n\lceil \lg n \rceil$ comparisons and $O(n \lg n)$
running time. It had been already noted [Knu98] that finding the down-steps of the array in linear
time allows improving the time of MergeSort to $O(n(1 + \lg \mathsf{nRuns}))$ (the down-step concept can be
applied to general sequences, where consecutive equal values do not break runs).

We now show that the construction process of our data structure sorts the permutation and,
applied on a general sequence, it achieves a refined sorting time of $O(n(1 + \mathcal{H}(\mathsf{vRuns}))) \subseteq O(n(1 +
\lg \mathsf{nRuns}))$ (since $\mathcal{H}(\mathsf{vRuns}) \le \lg \mathsf{nRuns}$).
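Stripped of the bitmaps, the construction is a MergeSort that detects the runs and merges them in Huffman order. The Python sketch below illustrates this under our own naming (`runs_merge_sort`) and counts the elements written by merges, which equals $\sum_i n_i \ell_i$; the heap adds $O(\mathsf{nRuns} \lg \mathsf{nRuns})$ time on top.

```python
import heapq

def runs_merge_sort(a):
    """Sort a list by merging its ascending runs, always joining the two shortest.

    Returns (sorted_list, moves), where moves counts the elements written by merges.
    """
    if not a:
        return [], 0
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:               # down step; equal values do not break a run
            runs.append(a[start:i])
            start = i
    runs.append(a[start:])
    heap = [(len(r), k, r) for k, r in enumerate(runs)]
    heapq.heapify(heap)
    tie, moves = len(heap), 0
    while len(heap) > 1:
        _, _, x = heapq.heappop(heap)
        _, _, y = heapq.heappop(heap)
        z = list(heapq.merge(x, y))       # linear-time merge of two sorted lists
        moves += len(z)
        heapq.heappush(heap, (len(z), tie, z))
        tie += 1
    return heap[0][2], moves

result, moves = runs_merge_sort([3, 1, 2, 2, 0])
```

In this small example the runs have lengths ⟨1, 3, 1⟩; the two singleton runs are merged first, so the long run takes part in a single merge.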



Theorem 3 There is an algorithm sorting an array of length $n$ covered by nRuns contiguous
monotone runs of lengths forming the vector vRuns in time $O(n(1 + \mathcal{H}(\mathsf{vRuns})))$, which is worst-case
optimal in the comparison model.

Proof. Our wavelet tree construction of Theorem 1 (and Corollary 2) indeed sorts $\pi$ within this
time, and it also works if the array is not a permutation. This is optimal because, even
considering just ascending runs, there are $\frac{n!}{n_1! n_2! \cdots n_{\mathsf{nRuns}}!}$ different permutations that can be covered
with runs of lengths forming the vector $\mathsf{vRuns} = \langle n_1, n_2, \ldots, n_{\mathsf{nRuns}} \rangle$. Thus $\lg \frac{n!}{n_1! n_2! \cdots n_{\mathsf{nRuns}}!}$
comparisons are necessary. Using Stirling's approximation to the factorial we have $\lg \frac{n!}{n_1! n_2! \cdots n_{\mathsf{nRuns}}!} =
(n + 1/2) \lg n - \sum_i (n_i + 1/2) \lg n_i - O(\lg \mathsf{nRuns})$. Since $\sum_i \lg n_i \le \mathsf{nRuns} \lg(n/\mathsf{nRuns})$, this is
$n\mathcal{H}(\mathsf{vRuns}) - O(\mathsf{nRuns} \lg(n/\mathsf{nRuns})) = n\mathcal{H}(\mathsf{vRuns}) - O(n)$. The term $\Omega(n)$ is also necessary to
read the input, hence implying a lower bound of $\Omega(n(1 + \mathcal{H}(\mathsf{vRuns})))$.

Note, however, that the set of permutations that can be covered with nRuns runs of lengths
vRuns may contain permutations that can be covered with fewer runs (as two consecutive runs
could be merged), and thus they have entropy less than $\mathcal{H}(\mathsf{vRuns})$, recall Definition 1. We
have proved that the lower bound applies to the union of two classes: one (1) contains (some)⁴
permutations of entropy $\mathcal{H}(\mathsf{vRuns})$ and the other (2) contains (some) permutations of entropy less
than $\mathcal{H}(\mathsf{vRuns})$. Obviously the bound does not hold for class (2) alone, as we can sort it in less
time. Since we can tell the class of a permutation in $O(n)$ time by counting the down-steps, it
follows that the bound also applies to class (1) alone (otherwise $O(n) + o(n\mathcal{H}(\mathsf{vRuns}))$ would be
achievable for (1) + (2)). □



3.5 Boosting Time Performance 

The time performance achieved in Theorem 1 (and Corollary 2) can be boosted by an $O(\lg\lg n)$
time factor by using Huffman codes of higher arity.

Given the run lengths vRuns, we build the $t$-ary Huffman tree for vRuns, with $t = \sqrt{\lg n}$. Since
now we merge $t$ children to build the parent, the sequence stored in the parent to indicate the
child each element comes from is not binary, but over alphabet $[1..t]$. In addition, we set up nRuns
pointers to provide direct access to the leaves, and parent pointers.

The total length of all the sequences stored at all the Huffman tree nodes is $< n(1 +
\mathcal{H}(\mathsf{vRuns})/\lg t)$ [Huf52]. To reduce the redundancy, we represent each sequence $S[1..m]$ stored
at a node using the compressed representation of Golynski et al. [GRR08, Lemma 9], which yields
space $m\mathcal{H}_0(S) + O(m \lg t \lg\lg m / \lg^2 m)$ bits.

⁴ Other permutations with vectors distinct from vRuns could also have entropy $\mathcal{H}(\mathsf{vRuns})$.

For the string $S[1..m]$ corresponding to a leaf covering run lengths $m_1, \ldots, m_t$, we have
$m\mathcal{H}_0(S) = \sum_i m_i \lg \frac{m}{m_i}$. From there we can carry out exactly the same analysis done in
Section 3.1 for binary trees, to conclude that the sum of the $m\mathcal{H}_0(S)$ bits for all the strings
$S$ over all the tree nodes is $n\mathcal{H}(\mathsf{vRuns})$. On the other hand, the redundancies add up to
$O(n(1 + \mathcal{H}(\mathsf{vRuns})/\lg t) \lg t \lg\lg n / \lg^2 n) = o(n)$ bits.⁵

The advantage of the $t$-ary representation is that the average leaf depth is $1 + \mathcal{H}(\mathsf{vRuns})/\lg t =
O(1 + \mathcal{H}(\mathsf{vRuns})/\lg\lg n)$. The algorithms to compute $\pi(i)$ and $\pi^{-1}(i)$ are similar, except that rank
and select are carried out on sequences $S$ over alphabets of size $\sqrt{\lg n}$. Those operations can still
be carried out in constant time on the representation we have chosen [GRR08]. The only detail
is that, for $\pi(i)$, we first moved from the root to the leaf using the field $leaves(v)$. This no longer
allows us to process a node in constant time, and thus we have opted for storing an array
of pointers to the leaves and parent pointers.

For the worst case, if $\mathsf{nRuns} = \omega(1)$, we can again limit the depth of the Huffman tree to
$O(\lg \mathsf{nRuns} / \lg\lg n)$ and maintain the same average time. The multiary case is far less understood
than the binary case. Recently, an algorithm to find the optimal length-restricted $t$-ary code has been
presented whose running time is linear once the lengths are sorted [Bae07]. To analyze the increase
in redundancy, consider the sub-optimal method that simply takes any node $v$ of depth more than
$\ell = 4 \lg \mathsf{nRuns} / \lg t$ and balances its subtree (so that height $5 \lg \mathsf{nRuns} / \lg t$ is guaranteed). Since any
node at depth $\ell$ covers a total length of at most $n / t^{\lfloor \ell/2 \rfloor}$ (see next paragraph), the sum of all the
lengths covered by these nodes is at most $\mathsf{nRuns} \cdot n / t^{\lfloor \ell/2 \rfloor}$. By forcing those subtrees to be balanced,
the average leaf depth increases by at most $(\lg \mathsf{nRuns} / \lg t) \, \mathsf{nRuns} / t^{\lfloor \ell/2 \rfloor} \le \lg(\mathsf{nRuns}) / (\mathsf{nRuns} \lg t) =
O(1)$. Hence the worst case is limited to $O(1 + \lg \mathsf{nRuns} / \lg\lg n)$ while the average case stays within
$O(1 + \mathcal{H}(\mathsf{vRuns}) / \lg\lg n)$. For the space we need a finer consideration: As $\mathsf{nRuns} = \omega(1)$, the increase
in average leaf depth is $o(1/\lg t)$. Since increasing by one the depth of a leaf covering $m$ elements
costs $m \lg t$ further bits, the total increase in space redundancy is $o(n)$.

The limit on the probability is obtained as follows. Consider a node $v$ in the $t$-ary Huffman
tree. Then $length(u) \ge length(v)$ for any uncle $u$ of $v$, as otherwise switching $v$ and $u$ improves
the already optimal Huffman tree. Hence $w$, the grandparent of $v$ (i.e., the parent of $u$) must cover
an area of size $length(w) \ge t \cdot length(v)$. Thus the covered length is multiplied at least by $t$ when
moving from a node to its grandparent. Conversely, it is divided at least by $t$ as we move from a
node to any grandchild. As the total length at the root is $n$, the length covered by any node $v$ at
depth $\ell$ is at most $length(v) \le n / t^{\lfloor \ell/2 \rfloor}$.

This yields our final result for contiguous monotone runs. 

Theorem 4 There is an encoding scheme using at most $n\mathcal{H}(\mathsf{vRuns}) + O(\mathsf{nRuns} \lg n) + o(n)$ bits to
encode a permutation $\pi$ over $[1..n]$ covered by nRuns contiguous monotone runs of lengths forming the
vector vRuns. It can be built within time $O(n(1 + \mathcal{H}(\mathsf{vRuns})/\lg\lg n))$, and supports the computation
of $\pi(i)$ and $\pi^{-1}(i)$ in time $O(1 + \lg \mathsf{nRuns} / \lg\lg n)$ for any value of $i \in [1..n]$. If $i$ is chosen uniformly
at random in $[1..n]$ then the average computation time is $O(1 + \mathcal{H}(\mathsf{vRuns})/\lg\lg n)$.

The only missing part is the construction time, since now we have to build strings $S[1..m]$ by
merging $t$ increasing runs. This can be done in $O(m)$ time by using atomic heaps [FW94]. The
compressed sequence representations are built in linear time [GRR08]. Note this implies that we
can sort an array with nRuns contiguous monotone runs of lengths forming the vector vRuns in time
$O(n(1 + \mathcal{H}(\mathsf{vRuns})/\lg\lg n))$, yet we are no longer within the comparison model.

⁵ Again, we can concatenate all the sequences to make sure this redundancy is asymptotic in $n$.

3.6 An Improved Sequence Representation 

Interestingly, the previous result yields almost directly a new representation of sequences that,
compared to the state of the art [FMMN07, GRR08], provides improved average time performance.

Theorem 5 Given a string S[1..n] over alphabet [1..σ] with zero-order entropy H₀(S), there is an
encoding for S using at most nH₀(S) + O(σ lg n) + o(n) bits and answering queries S[i], rank_c(S, i)
and select_c(S, i) in time O(1 + lg σ / lg lg n) for any c ∈ [1..σ] and i ∈ [1..n]. When i is chosen at
random in query S[i], or c is chosen with probability n_c/n in queries rank_c(S, i) and select_c(S, i),
where n_c is the frequency of c in S, the average query time is O(1 + H₀(S)/lg lg n).

Proof. We build exactly the same t-ary Huffman tree used in Theorem 4, using the frequencies
n_c instead of run lengths. The sequences at each internal node are formed so as to indicate how
the symbols in the child nodes are interleaved in S. This is precisely a multiary Huffman-shaped
wavelet tree [GGV03, FMMN07], and our previous analysis shows that the space used by the tree
is exactly as in Theorem 4, where now the entropy is H₀(S) = Σ_c (n_c/n) lg(n/n_c). The three queries are
solved by going down or up the tree and using rank and select on the sequences stored at the
nodes [GGV03, FMMN07]. Under the conditions stated for the average case, one arrives at the leaf
of symbol c with probability n_c/n, and then the average case complexities follow. □



4 Strict Runs 

Some classes of permutations can be covered by a small number of runs of a stricter type. We
present an encoding scheme that takes advantage of them.

Definition 3 A strict ascending run in a permutation π is a maximal range of positions satisfying
π(i + k) = π(i) + k. The head of such a run is its first position. The number of strict ascending runs
of π is denoted nSRuns, and the sequence of the lengths of the strict ascending runs is denoted vSRuns.
We will call vHRuns the sequence of contiguous monotone run lengths of the sequence formed by the
strict run heads of π. Similarly, the notion of a strict descending run can be defined, as well as that
of a strict (monotone) run encompassing both.

For example, the permutation (6, 7, 8, 9, 10, 1, 2, 3, 4, 5) contains nSRuns = 2 strict runs, of 
lengths vSRuns = (5, 5). The run heads are (6, 1), which form 1 monotone run, of lengths vHRuns = 
(2). Instead, the permutation (1,3,5, 7, 9, 2, 4, 6, 8, 10) contains nSRuns = 10 strict runs, each of 
length 1. 
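
The definition can be checked mechanically on the two example permutations. The following Python sketch scans a permutation once and reports, for each strict ascending run, the value at its head and its length. The function name `strict_ascending_runs` is ours and purely illustrative; strict descending runs are not handled.

```python
def strict_ascending_runs(perm):
    """Return (heads, lengths) of the maximal strict ascending runs of perm.

    A strict ascending run is a maximal range in which each value is exactly
    one more than its predecessor. `heads` holds the value stored at the
    first position of each run.
    """
    heads, lengths = [], []
    for idx, value in enumerate(perm):
        if idx > 0 and value == perm[idx - 1] + 1:
            lengths[-1] += 1          # the current run continues
        else:
            heads.append(value)       # a new run starts here
            lengths.append(1)
    return heads, lengths


heads, lengths = strict_ascending_runs([6, 7, 8, 9, 10, 1, 2, 3, 4, 5])
print(heads, lengths)                 # [6, 1] [5, 5]
```

On the second example, (1, 3, 5, 7, 9, 2, 4, 6, 8, 10), the same function returns ten runs of length 1.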

Theorem 6 Assume there is an encoding P for a permutation over [1..n] with nRuns contiguous
monotone runs of lengths forming the vector vRuns, which requires s(n, nRuns, vRuns) bits of
space and can apply the permutation and its inverse in time t(n, nRuns, vRuns). Now consider a
permutation π over [1..n] covered by nSRuns strict runs and by nRuns ≤ nSRuns monotone runs,
and let vHRuns be the vector formed by the nRuns monotone run lengths in the permutation of
strict run heads. Then there is an encoding scheme using at most s(nSRuns, nRuns, vHRuns) +
O(nSRuns lg(n/nSRuns)) + o(n) bits for π. It can be computed in O(n) time on top of that for building P.
It supports the computation of π(i) and π⁻¹(i) in time O(t(nSRuns, nRuns, vHRuns)) for any value
i ∈ [1..n].

Proof. We first set up a bitmap R of length n marking with a 1 bit the beginning of the strict runs.
We set up a second bitmap R_inv such that R_inv[i] = R[π⁻¹(i)]. Now we create a new permutation
π' over [1..nSRuns] which collapses the strict runs of π, π'(i) = rank_1(R_inv, π(select_1(R, i))). All
this takes O(n) time and the bitmaps take 2 nSRuns lg(n/nSRuns) + O(nSRuns) + o(n) bits in compressed
form [GGG+07], where rank and select are supported in constant time.

Now we build the structure P for π'. The number of monotone runs in π is the same as for the
sequence of strict run heads in π, and in turn the same as the runs in π'. So the number of runs
in π' is also nRuns and their lengths are vHRuns. Thus we require s(nSRuns, nRuns, vHRuns) further
bits.

To compute π(i), we find i' ← rank_1(R, i) and then compute j' ← π'(i'). The final answer is
select_1(R_inv, j') + i − select_1(R, i'). To compute π⁻¹(j), we find j' ← rank_1(R_inv, j) and then
compute i' ← (π')⁻¹(j'). The final answer is select_1(R, i') + j − select_1(R_inv, j'). The structure
requires only constant time on top of that needed to support the operator π'() and its inverse π'⁻¹(). □
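
The construction in this proof can be traced with a small Python sketch. It is only an illustration under simplifying assumptions: it handles strict ascending runs only, the compressed bitmaps R and R_inv are replaced by sorted lists of the positions of their 1 bits (so rank is a binary search and select is a list access), π' is stored as a plain list, and its inverse is found by a linear scan. All function and variable names are ours.

```python
from bisect import bisect_right


def build_collapsed(perm):
    """Collapse the strict ascending runs of perm (values in 1..n)."""
    n = len(perm)
    # Positions of the 1 bits of R: heads of the strict ascending runs.
    head_pos = [i for i in range(1, n + 1)
                if i == 1 or perm[i - 1] != perm[i - 2] + 1]
    # Positions of the 1 bits of R_inv: the values stored at those heads.
    head_val = sorted(perm[h - 1] for h in head_pos)
    # pi'(i) = rank_1(R_inv, pi(select_1(R, i))).
    rank_inv = {v: r for r, v in enumerate(head_val, start=1)}
    pi_prime = [rank_inv[perm[h - 1]] for h in head_pos]
    return head_pos, head_val, pi_prime


def apply_pi(i, head_pos, head_val, pi_prime):
    """pi(i) = select_1(R_inv, j') + i - select_1(R, i')."""
    i_p = bisect_right(head_pos, i)            # i' = rank_1(R, i)
    j_p = pi_prime[i_p - 1]                    # j' = pi'(i')
    return head_val[j_p - 1] + i - head_pos[i_p - 1]


def apply_inverse(j, head_pos, head_val, pi_prime):
    """pi^-1(j) = select_1(R, i') + j - select_1(R_inv, j')."""
    j_p = bisect_right(head_val, j)            # j' = rank_1(R_inv, j)
    i_p = pi_prime.index(j_p) + 1              # i' = (pi')^-1(j'), naive scan
    return head_pos[i_p - 1] + j - head_val[j_p - 1]


perm = [6, 7, 8, 9, 10, 1, 2, 3, 4, 5]
structure = build_collapsed(perm)
print(structure[2])                            # [2, 1]
print(apply_pi(3, *structure))                 # 8
print(apply_inverse(8, *structure))            # 3
```

In the example the ten-element permutation collapses to a permutation π' over only two elements, which is where the space saving of the theorem comes from.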

The theorem can be combined with previous results, for example Theorem 4, in order to obtain
concrete data structures. This representation is interesting because its space could be much less 
than n if nSRuns is small enough. However, it still retains an o(n) term that can be dominant. 
The following corollary describes a compressed data structure where the o(n) term is significantly 
reduced. 

Corollary 7 The o(n) term in the space of Theorem 6 can be replaced by O(nSRuns lg lg(n/nSRuns) + lg n)
at the cost of O(1 + lg nSRuns) extra time for the queries.

Proof. Replace the structure of Golynski et al. [GGG+07] by the binary searchable gap encoding of
Gupta et al. [GHSV06], which takes O(1 + lg nSRuns) time for rank and select (recall Section 2.3). □

Other tradeoffs for the bitmap encodings are possible, such as the one described by
Gupta [Gup07, Theorem 18, p. 155].

5 Shuffled Sequences 

Up to now our runs have been contiguous in π. Levcopoulos and Petersson [LP94] introduced
the more sophisticated concept of partitions formed by interleaved runs, such as Shuffled UpSequences
(SUS) and Shuffled Monotone Sequences (SMS). We now show how to take advantage of
permutations formed by shuffling (interleaving) a small number of runs.

Definition 4 A decomposition of a permutation π over [1..n] into Shuffled UpSequences is a set of,
not necessarily consecutive, subsequences of increasing numbers that have to be removed from π in
order to reduce it to the empty sequence. The number of shuffled upsequences in such a decomposition
of π is denoted nSUS, and the vector formed by the lengths of the involved shuffled upsequences, in
arbitrary order, is denoted vSUS. When the subsequences can be of increasing or decreasing numbers,
we call them Shuffled Monotone Sequences, call nSMS their number and vSMS the vector formed by
their lengths.

For example, the permutation (1, 6, 2, 7, 3, 8, 4, 9, 5, 10) contains nSUS = 2 shuffled upsequences
of lengths forming the vector vSUS = (5, 5), but nRuns = 5 runs, all of length 2. Interestingly,
we can reduce the problem of representing shuffled sequences to that of representing
strings and contiguous runs.
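
As a quick sanity check of the contrast between contiguous runs and shuffled upsequences, the Python sketch below uses the permutation (1, 6, 2, 7, 3, 8, 4, 9, 5, 10), which has exactly the properties stated in the example. The helper names are ours, and only ascending runs are considered.

```python
def ascending_run_lengths(perm):
    """Lengths of the maximal contiguous ascending runs of a non-empty list."""
    lengths = [1]
    for prev, cur in zip(perm, perm[1:]):
        if cur > prev:
            lengths[-1] += 1          # the run continues
        else:
            lengths.append(1)         # a descent starts a new run
    return lengths


def is_increasing(seq):
    """True when seq is strictly increasing, i.e. a valid upsequence."""
    return all(a < b for a, b in zip(seq, seq[1:]))


perm = [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
print(ascending_run_lengths(perm))    # [2, 2, 2, 2, 2]
# The elements at odd and at even positions form two interleaved upsequences.
print(is_increasing(perm[0::2]), is_increasing(perm[1::2]))   # True True
```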

5.1 Reduction to Strings and Contiguous Monotone Sequences 

We first show how a permutation with a small number of shuffled monotone sequences can be 
represented using strings over a small alphabet and permutations with a small number of contiguous 
monotone sequences. 

Theorem 8 Assume there exists an encoding P for a permutation over [1..n] with nRuns contiguous
monotone runs of lengths forming the vector vRuns, which requires s(n, nRuns, vRuns) bits of space
and supports the application of the permutation and its inverse in time t(n, nRuns, vRuns). Assume
also that there is a data structure S for a string S[1..n] over an alphabet of size nSMS with symbol
frequencies vSMS, using s'(n, nSMS, vSMS) bits of space and supporting operators rank, select, and
access to values S[i], in time t'(n, nSMS, vSMS). Now consider a permutation π over [1..n] covered
by nSMS shuffled monotone sequences of lengths vSMS. Then there exists an encoding of π using
at most s(n, nSMS, vSMS) + s'(n, nSMS, vSMS) + O(nSMS lg(n/nSMS)) + o(n) bits. Given the covering into
SMSs, the encoding can be built in time O(n), in addition to that of building P and S. It supports
the computation of π(i) and π⁻¹(i) in time t(n, nSMS, vSMS) + t'(n, nSMS, vSMS) for any value of
i ∈ [1..n]. The result is also valid for shuffled upsequences, in which case P is just required to handle
ascending runs.

Proof. Given the partition of π into nSMS monotone subsequences, we create a string S[1..n] over
alphabet [1..nSMS] that indicates, for each element of π, the label of the monotone sequence it
belongs to. We encode S[1..n] using the data structure S. We also store an array A[1..nSMS] so that
A[ℓ] is the accumulated length of all the sequences with label less than ℓ.

Now consider the permutation π' formed by the sequences taken in label order: π' can be covered
with nSMS contiguous monotone runs of lengths vSMS, and hence can be encoded using s(n, nSMS, vSMS)
additional bits using P. This supports the operators π'() and π'⁻¹() in time t(n, nSMS, vSMS)
(again, some of the runs could be merged in π', which only improves time and space in P). Thus
π(i) = π'(A[S[i]] + rank_{S[i]}(S, i)) can be computed in time t(n, nSMS, vSMS) + t'(n, nSMS, vSMS).
Similarly, π⁻¹(i) = select_ℓ(S, (π')⁻¹(i) − A[ℓ]), where ℓ is such that A[ℓ] < (π')⁻¹(i) ≤ A[ℓ+1], can
also be computed in time t(n, nSMS, vSMS) + t'(n, nSMS, vSMS), plus the time to find ℓ. The latter is
reduced to constant by representing A with a bitmap A'[1..n] with the bits set at the values A[ℓ] + 1,
so that A[ℓ] = select_1(A', ℓ) − 1, and the binary search is replaced by ℓ = rank_1(A', (π')⁻¹(i)).
With the structure of Golynski et al. [GGG+07], A' uses O(nSMS lg(n/nSMS)) + o(n) bits and operates
in constant time. □
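
The reduction can be illustrated with a small Python sketch. It is not the compressed structure of the theorem: the string S is kept as a plain list of labels, rank and select are answered from per-label lists of positions, the array A is searched by binary search instead of through the bitmap A', and π' is a plain list whose inverse is found by a linear scan. The names `build_shuffled`, `apply_pi`, `apply_inverse`, `occ` and `acc` are ours.

```python
from bisect import bisect_left, bisect_right


def build_shuffled(perm, labels, num_labels):
    """perm[i-1] = pi(i); labels[i-1] = S[i], a label in 1..num_labels."""
    # occ[l] lists the positions carrying label l, so that
    # select_l(S, k) = occ[l][k-1] and rank_l(S, i) = bisect_right(occ[l], i).
    occ = [[] for _ in range(num_labels + 1)]
    for pos, lab in enumerate(labels, start=1):
        occ[lab].append(pos)
    # acc[l] = total length of the sequences with label < l (acc[0] unused).
    acc = [0] * (num_labels + 2)
    for lab in range(1, num_labels + 1):
        acc[lab + 1] = acc[lab] + len(occ[lab])
    # pi' lists the subsequences one after another, in label order.
    pi_prime = [perm[pos - 1]
                for lab in range(1, num_labels + 1) for pos in occ[lab]]
    return occ, acc, pi_prime


def apply_pi(i, labels, occ, acc, pi_prime):
    """pi(i) = pi'(A[S[i]] + rank_{S[i]}(S, i))."""
    lab = labels[i - 1]
    return pi_prime[acc[lab] + bisect_right(occ[lab], i) - 1]


def apply_inverse(v, occ, acc, pi_prime):
    """pi^-1(v) = select_l(S, (pi')^-1(v) - A[l])."""
    p = pi_prime.index(v) + 1                  # (pi')^-1(v), naive scan
    lab = bisect_left(acc, p, 1) - 1           # acc[lab] < p <= acc[lab + 1]
    return occ[lab][p - acc[lab] - 1]


perm = [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
labels = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
occ, acc, pi_prime = build_shuffled(perm, labels, 2)
print(pi_prime)                                # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(apply_pi(4, labels, occ, acc, pi_prime)) # 7
print(apply_inverse(7, occ, acc, pi_prime))    # 4
```

Here π' consists of two contiguous ascending runs of length 5 (which happen to merge into a single run), so it is exactly the kind of input that the contiguous-run encoding P handles well.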

We will now obtain concrete results by using specific representations for P and S, and specific 
methods to find the decomposition into shuffled sequences. 
5.2 Shuffled UpSequences

Given an arbitrary permutation, one can decompose it in linear time into contiguous runs in order
to minimize H(vRuns), where vRuns is the vector of run lengths. However, decomposing the same
permutation into shuffled up (resp. monotone) sequences so as to minimize either nSUS or H(vSUS)
(resp. nSMS or H(vSMS)) is computationally harder.

Fredman [Fre75] gave an algorithm to compute a partition of minimum size nSUS into upsequences,
claiming a worst case complexity of O(n lg n). Even though he did not claim it at the
time, it is easy to observe that his algorithm is adaptive in nSUS and takes O(n(1 + lg nSUS))
time. We give here an improvement of his algorithm that computes the partition itself within time
O(n(1 + H(vSUS))), no worse than the time of his original algorithm, as H(vSUS) ≤ lg nSUS.

Theorem 9 If an array D[1..n] can be optimally covered by nSUS shuffled upsequences (equal values
do not break an upsequence), then there is an algorithm finding a covering of size nSUS in time
O(n(1 + H(vSUS))) ⊂ O(n(1 + lg nSUS)), where vSUS is the vector formed by the lengths of the
upsequences found.

Proof. Initialize a sequence S_1 = (D[1]), and a splay tree T [ST85] with the node (S_1), ordered by
the rightmost value of the sequence contained by each node. For each further array element D[i],
search for the sequence with the maximum ending point no larger than D[i]. If it exists, add D[i]
to this sequence, otherwise create a new sequence and add it to T.

Fredman [Fre75] already proved that this algorithm finds a partition of minimum size nSUS.
Note that, although the rightmost values of the splay tree nodes change when we insert a new 
element in their sequence, their relative position with respect to the other nodes remains the same, 
since all the nodes at the right hold larger values than the one inserted. This implies in particular 
that only searches and insertions are performed in the splay tree. 

A simple analysis, valid for both the plain sorted array in Fredman's proof and the splay tree of
our own proof, yields an adaptive complexity of O(n(1 + lg nSUS)) comparisons, since both structures
contain at most nSUS elements at any time. The additional linear term (relevant when nSUS = 1)
corresponds to the cost of reading each element once.

The analysis of the algorithm using the splay tree refines the complexity to O(n(1 + H(vSUS))),
where vSUS is the vector formed by the lengths of the upsequences found. These lengths correspond
to the frequencies of access to each node of the splay tree, which yields the total access time of
O(n(1 + H(vSUS))) [ST85, Theorem 2]. □
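
The greedy rule used in this proof can be sketched in Python as follows. Note the substitution: instead of a splay tree, the sketch keeps the rightmost values in a plain sorted list searched by binary search, in the spirit of the sorted array mentioned above. It therefore illustrates the partition that is found and the O(lg nSUS) comparisons per element, not the entropy-adaptive running time, and inserting at the front of a Python list adds a cost the theorem does not account for. The function name `sus_partition` is ours.

```python
from bisect import bisect_right


def sus_partition(data):
    """Greedy cover of data by upsequences (equal values do not break one)."""
    tails = []   # rightmost value of each upsequence, in ascending order
    seqs = []    # seqs[k] is the upsequence whose rightmost value is tails[k]
    for x in data:
        # Index of the sequence with the largest rightmost value <= x.
        k = bisect_right(tails, x) - 1
        if k >= 0:
            seqs[k].append(x)
            tails[k] = x      # still below tails[k + 1], so the order is kept
        else:
            # x is smaller than every rightmost value: start a new sequence.
            seqs.insert(0, [x])
            tails.insert(0, x)
    return seqs


parts = sus_partition([1, 6, 2, 7, 3, 8, 4, 9, 5, 10])
print(parts)    # [[2, 3, 4, 5], [1, 6, 7, 8, 9, 10]]
```

On this input the greedy rule finds a cover of minimum size 2, but with lengths (4, 6) rather than the (5, 5) obtained by splitting odd and even positions. The lengths of the cover depend on the algorithm, even when its size is optimal.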

The theorem obviously applies to the particular case where the array is a permutation. For permutations
and, in general, integer arrays over a universe [1..m], we can deviate from the comparison
model and find the partition within time O(n lg lg m), by using y-fast tries [Wil83] instead of splay
trees.

We can now give a concrete representation for shuffled upsequences. The complete description
of the permutation requires encoding the computation of the partitioning and of the comparisons
performed by the sorting algorithm. This time the encoding cost of partitioning is as important as
that of merging.

Theorem 10 Let π be a permutation over [1..n] that can be optimally covered by nSUS shuffled upsequences,
and let vSUS be the vector formed by the lengths of the decomposition found by the algorithm
of Theorem 9. Then there is an encoding scheme for π using at most 2nH(vSUS) + O(nSUS lg n) + o(n)
bits. It can be computed in time O(n(1 + H(vSUS))), and supports the computation of π(i) and π⁻¹(i)
in time O(1 + lg nSUS / lg lg n) for any value of i ∈ [1..n]. If i is chosen uniformly at random in
[1..n] the average query time is O(1 + H(vSUS)/lg lg n).

Proof. We first use Theorem 9 to find the SUS partition of optimal size nSUS, and the corresponding
vector vSUS formed by the sizes of the subsequences of this partition. Then we apply
Theorem 8. For the data structure S we use Theorem 5, whereas for P we use Theorem 4. Note
H(vSUS) is both H₀(S) and H(vRuns) for permutation π'. The result follows immediately. □

One would be tempted to consider the case of a permutation π covered by nSUS upsequences
which form strict runs, as a particular case. Yet, this is achieved by resorting directly to Theorem 4.
The corollary extends verbatim to shuffled monotone sequences.

Corollary 11 There is an encoding scheme using at most nH(vSUS) + O(nSUS lg n) + o(n) bits to
encode a permutation π over [1..n] optimally covered by nSUS shuffled upsequences, of lengths forming
the vector vSUS, and made up of strict runs. It can be built within time O(n(1 + H(vSUS)/lg lg n)),
and supports the computation of π(i) and π⁻¹(i) in time O(1 + lg nSUS / lg lg n) for any value of
i ∈ [1..n]. If i is chosen uniformly at random in [1..n] then the average query time is O(1 +
H(vSUS)/lg lg n).

Proof. It is sufficient to invert π and represent π⁻¹ using Theorem 4, since in this case π⁻¹ is
covered by nSUS ascending runs of lengths forming the vector vSUS: if i_0 < i_1 < ... < i_m forms a
strict upsequence, so that π(i_t) = π(i_0) + t, then calling j_0 = π(i_0) we have the ascending run
π⁻¹(j_0 + t) = i_t for 0 ≤ t ≤ m. □

Once more, our construction translates into an improved sorting algorithm, improving on the
complexity O(n(1 + lg nSUS)) of the algorithm by Levcopoulos and Petersson [LP94].

Corollary 12 We can sort an array of length n, optimally covered by nSUS shuffled upsequences,
in time O(n(1 + H(vSUS))), where vSUS are the lengths of the decomposition found by the algorithm
of Theorem 9.

Proof. Our construction in Theorem 10 finds and separates the subsequences of π, and sorts them,
all within this time (we do not need to build the string S). □

Open problem Note that the algorithm of Theorem 9 finds a partition of minimal size nSUS (this
is what we refer to with "optimally covered"), but that the entropy H(vSUS) of this partition is not
necessarily minimal: there could be another partition, even of size larger than nSUS, with lower
entropy. Our results are only a function of the entropy of the partition of minimal size nSUS found.
This is unsatisfactory, as the ideal would be to speak in terms of the minimum possible H(vSUS),
just as we could do for H(vRuns).

As an example, consider the permutation (1, 2, ..., n/2−1, n, n/2, n/2+1, ..., n−1),
for some even integer n. The algorithm of Theorem 9 yields the partition
{(1, 2, ..., n/2−1, n), (n/2, n/2+1, ..., n−1)}, for which nH((n/2, n/2)) = n lg 2 = n. This is
suboptimal, as the partition {(1, 2, ..., n/2−1, n/2, n/2+1, ..., n−1), (n)} is of much smaller
entropy, nH((n−1, 1)) = (n−1) lg(n/(n−1)) + lg n = O(lg n).
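
The gap between the two partitions can be made concrete with a short computation. The Python sketch below evaluates n·H(v) = Σ n_i lg(n/n_i) for a vector v of lengths summing to n, using the usual entropy of the length distribution. The function name `total_entropy_bits` and the choice n = 1024 are ours.

```python
from math import log2


def total_entropy_bits(lengths):
    """n * H(lengths) = sum of n_i * lg(n / n_i), where n = sum(lengths)."""
    n = sum(lengths)
    return sum(n_i * log2(n / n_i) for n_i in lengths)


n = 1024
print(total_entropy_bits([n // 2, n // 2]))   # 1024.0, that is, n
print(total_entropy_bits([n - 1, 1]))         # roughly 11.44, that is, O(lg n)
```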
On the other hand, a greedy online algorithm cannot minimize the entropy of a SUS partitioning.
As an example consider the permutation (2, 3, ..., n/2, 1, n, n/2+1, ..., n−1), for some even
integer n. A greedy online algorithm that, after processing a prefix of the sequence, minimizes the
entropy of such prefix, produces the partition {(1, n/2+1, ..., n−1), (2, 3, ..., n/2, n)}, of size 2 and
with nH((n/2, n/2)) = n. However, a much better partition is {(1, n), (2, 3, ..., n−1)}, of size 2
and with nH((2, n−2)) = O(lg n).

We doubt that the SUS partition minimizing H(vSUS) can be found within time O(n(1 +
H(vSUS))) or even O(n(1 + lg nSUS)). Proving this right or wrong is an open challenge.

5.3 Shuffled Monotone Sequences 

No efficient algorithm is known to compute the minimum number nSMS of shuffled monotone sequences
composing a permutation, let alone finding a partition minimizing the entropy H(vSMS) of
the lengths of the subsequences. The problem is NP-hard, by reduction to the computation of the
"cochromatic" number of the graph corresponding to the permutation [KSW96].

Yet, should such a partition into monotone subsequences be available, and be of smaller entropy 
than the partitions considered in the previous sections, this would yield an improved encoding by 
doing just as in Theorem 10 for SUS.

Note that it takes a difference by a superpolynomial margin between the values of nSUS and
nSMS to yield a noticeable difference between lg nSUS and lg nSMS, and hence between the values
of H(vSUS) and H(vSMS). It seems unlikely that such a difference would justify the difference of
computing time between the two types of partitions, also different by a superpolynomial margin to
the best of current knowledge (i.e., if P ≠ NP).

6 Conclusions 

Relation between space and time Bentley and Yao [BY76] introduced a family of search
algorithms adaptive to the position of the element sought (also known as the "unbounded search"
problem) through the definition of a family of adaptive codes for unbounded integers, hence proving
that the link between algorithms and encodings was not limited to the complexity lower bounds
suggested by information theory. Such a relation between "time" and "space" can be found in other
contexts: algorithms to merge two sets define an encoding for sets [AL09], and the binary results of
the comparisons of any deterministic sorting algorithm in the comparison model yield an encoding
of the permutation being sorted.

We have shown that some concepts originally defined for adaptive variants of the algorithm 
MergeSort, such as runs and shuffled sequences, are useful in terms of the compression of permuta- 
tions, and conversely, that concepts originally defined for data compression, such as the entropy of 
the sets of run lengths, are a useful addition to the set of difficulty measures previously considered 
in the study of adaptive sorting algorithms. 

Much more work is required to explore the application to the compression of permutations
and strings of the many other measures of preorder introduced in the study of adaptive sorting
algorithms. Figure 1 represents graphically some of those measures of presortedness (adding to those
described by Moffat and Petersson [MP92] those described in this and other recent work [BFN11])
and a preorder on them based on optimality implication in terms of the number of comparisons
performed. This is relevant for the space of the corresponding permutation encodings, and for the
space used by the potential corresponding compressed data structures for permutations. Note that
the reductions in this graph do not represent reductions in terms of optimality of the running time
to find the partitions. For instance, we saw that H(vSMS)-optimality implies H(vSUS)-optimality in
terms of the number of comparisons performed, but not in terms of the running time. In terms of
data structures, this relates to the construction time of the compressed data structure (as opposed
to the space it takes).

Figure 1: Partial order on some measures of disorder for adaptive sorting. New results are on the

Adaptive operators It is worth noticing that, in many cases, the time to support the operators
on the compressed permutations is smaller as the permutation is more compressed, in contrast
with the traditional setting where one needs to decompress part or all of the data in order to support
the operators. This behavior, incidental in our study, is a very strong incentive to further develop
the study of difficulty or compressibility measures: measures such that "easy" instances can both
be compressed and manipulated in better time capture the essence of the data.

Compressed indices Interestingly enough, our encoding techniques for permutations compress 
both the permutation and its index (i.e., the extra data to speed up the operators). This is opposed 
to previous work [MRRR03] on the encoding of permutations, whose index size varied with the size of
the cycles of the permutation, but whose data encoding was fixed; and to previous work [BHMR07] 
where the data itself can be compressed but not the index, to the point where the space used by the 
index dominates that used by the data itself. This direction of research is promising, as in practice 
it is more interesting to compress the whole succinct data structure or at least its index, rather than 
just the data. 

Applications Permutations are everywhere, so that compressing their representation helps compress
many other forms of data, and supporting in reasonable time the operators on permutations
yields support for other operators.

As a first example, consider a natural language text tokenized into word identifiers. Its word-based
inverted index stores for each distinct word the list of its occurrences in the tokenized text,
in increasing order. This is a popular data structure for text indexing [BYRN11, WMB99]. By
regarding the concatenation of the lists of occurrences of all the words, a permutation π is obtained
that is formed by v contiguous ascending runs, where v is the vocabulary size of the text. The lengths
of those runs correspond to the frequencies of the words in the text. Therefore our representation
achieves the zero-order word-based entropy of the text, which in practice compresses the text to
about 25% of its original size [BCW90]. With π(i) we can access any position of any inverted
list, and with π⁻¹(j) we can find the word that is at any text position j. Thus the representation
contains the text and its inverted index within the space of the compressed text.
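
To make the correspondence concrete, here is a minimal Python sketch that builds the permutation from a tokenized text and recovers the word at a text position through the inverse permutation. It uses plain lists and a linear scan for the inverse, not the compressed representation. The function names and the `starts` array of run boundaries are our own illustrative choices.

```python
from bisect import bisect_right


def inverted_index_permutation(tokens):
    """Concatenate the inverted lists of all words, in vocabulary order."""
    vocab = sorted(set(tokens))
    lists = {word: [] for word in vocab}
    for pos, word in enumerate(tokens, start=1):
        lists[word].append(pos)               # occurrences in increasing order
    perm, starts = [], []
    for word in vocab:
        starts.append(len(perm) + 1)          # where this word's run begins
        perm.extend(lists[word])
    return vocab, starts, perm


def word_at(j, vocab, starts, perm):
    """Word at text position j, found through the inverse permutation."""
    i = perm.index(j) + 1                     # pi^-1(j), naive linear scan
    return vocab[bisect_right(starts, i) - 1] # run (word) that contains i


tokens = "to be or not to be".split()
vocab, starts, perm = inverted_index_permutation(tokens)
print(perm)                                   # [2, 6, 4, 3, 1, 5]
print(word_at(5, vocab, starts, perm))        # to
```

The permutation has one ascending run per distinct word, here (2, 6), (4), (3) and (1, 5), and the run lengths are the word frequencies.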

A second example is given by compressed suffix arrays (CSAs), which are data structures for
indexing general texts. A family of CSAs builds on a function called Ψ [GV06, Sad03, GGV03],
which is actually a permutation. Much effort was spent in compressing Ψ to the zero- or higher-order
entropy of the text while supporting direct access to it. It turns out that Ψ contains σ contiguous
increasing runs, where σ is the alphabet size of the text, and that the run lengths correspond to the
symbol frequencies. Thus our representation of Ψ would reach the zero-order entropy of the text.
It supports not only access to Ψ but also to its inverse Ψ⁻¹, which enables so-called bidirectional
indexes [RNOM09], which have several interesting properties. Furthermore, Ψ contains a number of
strict ascending runs that depends on the high-order entropy of the text, and this allows compressing
it further [NM07].
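
The run structure of Ψ can be observed on a small text. The Python sketch below assumes the usual definition Ψ(i) = SA⁻¹[SA[i] + 1] (taken cyclically), which is not spelled out in this section, and a text ending with a unique terminator `$` that is smaller than every other symbol. The suffix array is built by naive sorting, which is adequate only for illustration. The function names are ours.

```python
def psi_function(text):
    """Psi[i] = rank of the suffix that starts one position after suffix SA[i]."""
    n = len(text)
    sa = sorted(range(n), key=lambda k: text[k:])   # naive suffix array
    isa = [0] * n                                   # inverse suffix array
    for rank, start in enumerate(sa):
        isa[start] = rank
    return [isa[(start + 1) % n] for start in sa]


def count_ascending_runs(seq):
    """Number of maximal contiguous ascending runs of a non-empty sequence."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if b < a)


text = "abracadabra$"
psi = psi_function(text)
# Psi is a permutation of the ranks, with at most one ascending run per symbol.
print(sorted(psi) == list(range(len(text))))
print(count_ascending_runs(psi), len(set(text)))
```

Within the range of suffixes that start with the same symbol, Ψ is increasing, so the number of runs never exceeds the alphabet size; adjacent ranges may merge into a single run.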

From a practical point of view, our encoding schemes are simple enough to be implemented.
Some preliminary results on inverted indexes and compressed suffix arrays show good performance
on practical data sets. As an external test, the techniques were successfully used to handle scalability
problems in MPI applications [KMW10].

Followup Our preliminary results [BN09] have stimulated further research. This is just a glimpse
of the work that lies ahead on this topic. 

While developing, with J. Fischer, compressed indexes for Range Minimum Queries based
on Left-to-Right Minima (LRM) trees [Fis10] [SNIP], we realized that LRM trees yield a technique
to rearrange in linear time nRuns contiguous ascending runs of lengths forming vector vRuns, into a
partition of nLRM = nRuns ascending subsequences of lengths forming a new vector vLRM, of smaller
entropy H(vLRM) ≤ H(vRuns) [BFN11]. Compared to a SUS partition, the LRM partition can have
larger entropy, but it is much cheaper to compute and encode. We represent it in Figure 1 between
H(vRuns) and H(vSUS).

While developing, with T. Gagie and Y. Nekrich, an elegant combination of previously known
compressed string data structures to attain superior space/time trade-offs [BGNN10], we realized
that this yields various compressed data structures for permutations π such that the times for π()
and π⁻¹() are improved to log-logarithmic. While those results subsume our initial findings [BN09],
the improved results now presented in Theorem 4 are incomparable, and in particular superior when
the number of runs is polylogarithmic in n.